I am checking your cleaning log against your clean data.
Out of the roughly 25k entries in your cleaning log, about 12k are correct, 4k are incorrect and 10k are NA.
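NAs
Let’s first focus on the NAs. check_clean is NA whenever either New Value or value_clean is missing, so these entries could not be compared at all.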
# Number of NA checks per question, most frequent first
dd %>%
filter(is.na(check_clean)) %>%
group_by(Question) %>%
tally() %>%
arrange(desc(n))
How to read the table
dd %>% head()
Question: question name, from your cleaning log
uuid: unique identifier of the survey
Old Value: from your cleaning log
New Value: from your cleaning log
Reason: from your cleaning log
value_raw: value from the raw dataset
value_clean: value from the clean dataset
check_clean: TRUE if New Value (from the cleaning log) equals value_clean (from the clean data), FALSE if they differ, NA if either is missing
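A small toy example (hypothetical values, not taken from your data) shows why check_clean can be NA, and one possible way to record which side of the comparison is missing. The column name `na_source` is made up for illustration.

```r
library(dplyr)

# Toy rows (hypothetical values) mimicking the New Value / value_clean columns
toy <- tibble(
  `New Value` = c("a", "b", NA, "c"),
  value_clean = c("a", "x", "y", NA)
)

toy <- toy %>%
  mutate(
    # `==` returns NA as soon as one side is NA
    check_clean = `New Value` == value_clean,
    # Record which side is missing when the comparison is NA
    na_source = case_when(
      !is.na(check_clean) ~ "compared",
      is.na(`New Value`) & is.na(value_clean) ~ "both missing",
      is.na(`New Value`) ~ "New Value missing",
      TRUE ~ "value_clean missing"
    )
  )

toy
```

Counting `na_source` per Question would tell whether the NAs come from an empty log entry or from a value absent in the clean data.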
ALL
logg %>% filter(Question == "All") %>% mutate(uuid_dup = duplicated(uuid))
There are 1681 surveys deleted and 1676 entries in the deletion log.
There are 9 duplicated uuids in the cleaning log for deletion.
Action: none needed.
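Comparing counts (1681 vs 1676, with 9 duplicates) does not say which surveys differ. A minimal sketch, assuming only a `uuid` column in each dataset, lists the uuids on each side with set operations; the toy tibbles below stand in for raww, cleann and logg and are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: raw has 5 surveys, clean kept 2 of them
raw_demo   <- tibble(uuid = c("u1", "u2", "u3", "u4", "u5"))
clean_demo <- tibble(uuid = c("u1", "u4"))
# Deletion entries in the cleaning log (Question == "All"); "u2" logged twice
log_demo <- tibble(uuid = c("u2", "u2", "u3"), Question = "All")

# Surveys present in raw but absent from clean
deleted_uuids <- setdiff(raw_demo$uuid, clean_demo$uuid)
logged_uuids  <- log_demo %>% filter(Question == "All") %>% pull(uuid)

# uuids logged more than once for deletion
dup_in_log <- unique(logged_uuids[duplicated(logged_uuids)])
# Deleted without a log entry, and logged but not actually deleted
deleted_not_logged <- setdiff(deleted_uuids, logged_uuids)
logged_not_deleted <- setdiff(logged_uuids, deleted_uuids)
```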
sanitation_features_other
dd %>% filter(Question == "sanitation_features_other", is.na(check_clean))
It seems that 10% of households do not have toilets.
Action: check this against the skip logic.
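One possible skip-logic check is that an `_other` text field should only be filled when "other" is selected in its parent question. The sketch below assumes the parent is the same name without `_other` (here a hypothetical `sanitation_features` column) and that multiple choices are stored space-separated, as in the sanitation_facilities_problems values shown later; the choice `handwashing` is made up.

```r
library(dplyr)

# Hypothetical clean data: parent question + its free-text "other" field
clean_demo <- tibble(
  uuid = c("u1", "u2", "u3", "u4"),
  sanitation_features = c("other", "handwashing other", "handwashing", NA),
  sanitation_features_other = c("no toilet", "soap", "no toilet", "open space")
)

skip_issues <- clean_demo %>%
  mutate(
    # TRUE when "other" is one of the selected choices (FALSE if parent is NA)
    parent_has_other = grepl("\\bother\\b", sanitation_features, perl = TRUE)
  ) %>%
  # Text in the _other field although "other" was not selected
  filter(!parent_has_other, !is.na(sanitation_features_other))

skip_issues$uuid
```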
main_source_water_other
dd %>% filter(Question == "main_source_water_other", is.na(check_clean))
dd %>%
filter(Question == "main_source_water_other", is.na(check_clean)) %>%
select(uuid, `New Value`) %>%
left_join(select(cleann, main_source_water, uuid))
## Joining, by = "uuid"
Action: explore what happened. Was the “other” answer recoded?
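A minimal sketch of how the join above could be turned into a status per log entry, assuming main_source_water is a single-choice column; the toy values are hypothetical. Passing `by = "uuid"` explicitly also silences the "Joining, by" message.

```r
library(dplyr)

# Hypothetical log entries and clean parent values
log_demo <- tibble(
  uuid = c("u1", "u2", "u3"),
  Question = "main_source_water_other",
  `New Value` = c("borehole", "river", "tanker")
)
clean_demo <- tibble(
  uuid = c("u1", "u2", "u3"),
  main_source_water = c("borehole", "other", NA)
)

recode_check <- log_demo %>%
  select(uuid, `New Value`) %>%
  left_join(select(clean_demo, uuid, main_source_water), by = "uuid") %>%
  mutate(status = case_when(
    is.na(main_source_water) ~ "parent missing",
    main_source_water == "other" ~ "still other",
    TRUE ~ "recoded"
  ))

count(recode_check, status)
```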
when_arrived_current_location
dd %>% filter(Question == "when_arrived_current_location", is.na(check_clean)) %>%
View()
In your cleaning log, “New Value” is empty while the clean dataset contains a value, and no reason is given.
Action: explore what happened.
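To list all such entries at once rather than with View(), one could filter on an empty New Value together with a non-missing value_clean. This is a sketch on hypothetical toy rows; it treats both NA and "" as empty, since how blank cells are read depends on the import function.

```r
library(dplyr)

# Hypothetical rows of dd for one question
dd_demo <- tibble(
  uuid = c("u1", "u2", "u3"),
  Question = "when_arrived_current_location",
  `New Value` = c(NA, "2019-05-01", NA),
  value_clean = c("2019-03-10", "2019-05-01", NA)
)

# Log gives no new value, but the clean data has one
empty_new_value <- dd_demo %>%
  filter(is.na(`New Value`) | `New Value` == "", !is.na(value_clean))

empty_new_value$uuid
```

Dropping the Question filter and adding `count(Question)` would show which questions are affected.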
when_leave_place_origin
dd %>% filter(Question == "when_leave_place_origin", is.na(check_clean)) %>%
View()
Same as above.
67
dd %>% filter(Question == "67", is.na(check_clean))
It is not clear what the variable “67” is for, but the issue is the same as above. In addition, all of these entries were changed to 67 although their values were different.
Action: investigate on your side.
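The pattern "many different old values replaced by a single new value" can be searched for across all questions. A sketch on hypothetical toy rows (the old values are invented):

```r
library(dplyr)

dd_demo <- tibble(
  Question = c("67", "67", "67", "q_other", "q_other"),
  `Old Value` = c("12", "40", "55", "a", "b"),
  `New Value` = c("67", "67", "67", "x", "y")
)

# Questions where several distinct old values all became one new value
collapsed <- dd_demo %>%
  group_by(Question) %>%
  summarise(
    n = n(),
    n_old = n_distinct(`Old Value`),
    n_new = n_distinct(`New Value`),
    .groups = "drop"
  ) %>%
  filter(n_old > 1, n_new == 1)

collapsed
```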
sanitation_facilities_problems_other
dd %>% filter(Question == "sanitation_facilities_problems_other", is.na(check_clean)) %>%
pull(Reason) %>% table()
## .
## Clarification from enumerator
## 339
## Clarification from enumerator + Translating to English
## 16
## other options recoded
## 4
## other options recoded in the choice list
## 22
## Translating to English
## 272
dd %>% filter(Question == "sanitation_facilities_problems_other", is.na(check_clean),
Reason != "other options recoded",
Reason != "other options recoded in the choice list")
dd %>%
filter(Question == "sanitation_facilities_problems_other", is.na(check_clean),
Reason != "other options recoded",
Reason != "other options recoded in the choice list") %>%
select(uuid, `New Value`) %>%
left_join(select(cleann, sanitation_facilities_problems, uuid)) %>%
select(`New Value`, sanitation_facilities_problems) %>% table()
## Joining, by = "uuid"
## sanitation_facilities_problems
## New Value latrines_too_far other other
## Big problems 0 2
## Dont have sanitation facilities 0 4
## Dont have toilet 0 216
## Dont have toilet, need to dig and build a toilet 0 2
## Dont have toilet, share with a family 0 2
## Go to open spaces 0 2
## Go to open spaces' 0 2
## Goto op[en spaces 0 2
## No , dont have toilet 0 16
## No sanitation facilities 0 4
## None of the above 0 2
## Only have and open bit 0 2
## open defication 0 1
## out toilet needs renovation 0 2
## There are no toilets 0 2
## There is no sanitation facility 0 22
## Use with neighbour 0 2
## We dont have toilet cleaning materials 0 0
## We go out at night 0 0
## we used open deficaiton 0 19
## Yes 0 2
## sanitation_facilities_problems
## New Value other facilities_too_crowded
## Big problems 0
## Dont have sanitation facilities 0
## Dont have toilet 2
## Dont have toilet, need to dig and build a toilet 0
## Dont have toilet, share with a family 0
## Go to open spaces 0
## Go to open spaces' 0
## Goto op[en spaces 0
## No , dont have toilet 0
## No sanitation facilities 0
## None of the above 0
## Only have and open bit 0
## open defication 0
## out toilet needs renovation 0
## There are no toilets 0
## There is no sanitation facility 0
## Use with neighbour 0
## We dont have toilet cleaning materials 0
## We go out at night 2
## we used open deficaiton 0
## Yes 0
## sanitation_facilities_problems
## New Value unclean_unhygienic other
## Big problems 0
## Dont have sanitation facilities 0
## Dont have toilet 0
## Dont have toilet, need to dig and build a toilet 0
## Dont have toilet, share with a family 0
## Go to open spaces 0
## Go to open spaces' 0
## Goto op[en spaces 0
## No , dont have toilet 0
## No sanitation facilities 0
## None of the above 0
## Only have and open bit 0
## open defication 0
## out toilet needs renovation 0
## There are no toilets 0
## There is no sanitation facility 0
## Use with neighbour 0
## We dont have toilet cleaning materials 2
## We go out at night 0
## we used open deficaiton 0
## Yes 0
Same issue as for main_source_water_other: check whether the recoding happened or should have happened. Action: explore what happened when “other” answers were recoded.
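The wide table() output above wraps across several blocks and is hard to read. As an alternative, count() gives one row per observed combination. A sketch on hypothetical toy rows (the labels come from the output above, the counts do not):

```r
library(dplyr)

joined_demo <- tibble(
  `New Value` = c("Dont have toilet", "Dont have toilet", "We go out at night"),
  sanitation_facilities_problems = c("other", "other", "other facilities_too_crowded")
)

# Long format: only combinations that occur, most frequent first
res <- joined_demo %>%
  count(`New Value`, sanitation_facilities_problems, sort = TRUE)

res
```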
shelter_issues_other
dd %>%
filter(Question == "shelter_issues_other", is.na(check_clean))
dd %>%
filter(Question == "shelter_issues_other", is.na(check_clean)) %>%
select(uuid, `New Value`) %>%
left_join(select(cleann, shelter_issues, uuid)) %>%
select(`New Value`, shelter_issues) %>% table()
## Joining, by = "uuid"
## shelter_issues
## New Value lack_bathing
## Its nice 0
## Its normal 0
## My house is safe 0
## no issue 0
## No problem 0
## No problems 0
## No, No problem 4
## None 0
## None of the above 0
## Nothing 0
## Thanks to Allah 0
## There is no issue 0
## There is no problem 0
## shelter_issues
## New Value lack_bathing lack_cooking lack_lights_inside lack_lights_outside
## Its nice 0
## Its normal 0
## My house is safe 0
## no issue 0
## No problem 0
## No problems 0
## No, No problem 0
## None 2
## None of the above 0
## Nothing 0
## Thanks to Allah 0
## There is no issue 0
## There is no problem 0
## shelter_issues
## New Value lack_bathing lack_cooking lack_lights_outside
## Its nice 0
## Its normal 0
## My house is safe 0
## no issue 0
## No problem 0
## No problems 0
## No, No problem 2
## None 0
## None of the above 0
## Nothing 0
## Thanks to Allah 0
## There is no issue 0
## There is no problem 0
## shelter_issues
## New Value lack_cooking lack_cooking other lack_lights_outside
## Its nice 0 0 0
## Its normal 0 0 0
## My house is safe 0 0 0
## no issue 0 0 0
## No problem 0 0 0
## No problems 0 0 0
## No, No problem 0 0 2
## None 2 0 0
## None of the above 0 0 0
## Nothing 0 0 0
## Thanks to Allah 0 0 0
## There is no issue 0 0 0
## There is no problem 0 0 0
## shelter_issues
## New Value lack_lights_outside lack_lights_inside
## Its nice 0
## Its normal 0
## My house is safe 0
## no issue 0
## No problem 0
## No problems 0
## No, No problem 2
## None 0
## None of the above 0
## Nothing 0
## Thanks to Allah 0
## There is no issue 0
## There is no problem 0
## shelter_issues
## New Value lack_lights_outside unsafe_bathing lack_privacy
## Its nice 0 0
## Its normal 0 0
## My house is safe 0 0
## no issue 0 0
## No problem 0 0
## No problems 0 0
## No, No problem 2 0
## None 0 2
## None of the above 0 0
## Nothing 0 0
## Thanks to Allah 0 0
## There is no issue 0 0
## There is no problem 0 0
## shelter_issues
## New Value lack_space lack_privacy lack_lights_outside lack_lights_inside unsafe_cooking
## Its nice 0
## Its normal 0
## My house is safe 0
## no issue 0
## No problem 0
## No problems 0
## No, No problem 2
## None 0
## None of the above 0
## Nothing 0
## Thanks to Allah 0
## There is no issue 0
## There is no problem 0
## shelter_issues
## New Value other other lack_lights_inside unsafe_bathing
## Its nice 2 0 0
## Its normal 2 0 0
## My house is safe 2 0 0
## no issue 0 0 0
## No problem 8 0 0
## No problems 6 0 0
## No, No problem 98 0 2
## None 52 0 0
## None of the above 2 0 0
## Nothing 2 0 0
## Thanks to Allah 4 0 0
## There is no issue 2 0 0
## There is no problem 32 0 0
## shelter_issues
## New Value unsafe_bathing other
## Its nice 0
## Its normal 0
## My house is safe 0
## no issue 0
## No problem 0
## No problems 0
## No, No problem 2
## None 0
## None of the above 0
## Nothing 0
## Thanks to Allah 0
## There is no issue 0
## There is no problem 0
It seems these answers mean “no issue”. Should they be recoded, or removed? Action: investigate whether the recoding is correct.
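If these answers are to be recoded or removed, they first need to be identified. A rough sketch flags free-text answers that look like "no issue" with a regular expression; the pattern is a hypothetical starting point and should be checked against the actual answers (for example, it does not handle "It's").

```r
library(dplyr)

answers <- tibble(`New Value` = c("No, No problem", "None", "There is no issue",
                                  "Its normal", "Leaking roof"))

# Hypothetical pattern for answers meaning "no issue"; adjust to your data
no_issue_pattern <- "^(no\\b|none|nothing|there is no|its (nice|normal))"

answers <- answers %>%
  mutate(means_no_issue = grepl(no_issue_pattern, tolower(`New Value`), perl = TRUE))

answers
```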
hh_main_source_income_other
dd %>%
filter(Question == "hh_main_source_income_other", is.na(check_clean)) %>%
select(uuid, `Old Value`) %>%
left_join(select(cleann, hh_main_source_income, uuid))
## Joining, by = "uuid"
dd %>%
filter(Question == "hh_main_source_income_other", is.na(check_clean)) %>%
pull(Reason) %>%
table()
## .
## Anwser was choice list Clarification from choices
## 98 177
## Clarification from enumerator Clarification from translation
## 1 48
## irrelevent entries other options recoded in the choices
## 58 10
## other options recoded inthe choices reclasified
## 8 11
## response already in the list of choices
## 3
dd %>%
filter(Question == "hh_main_source_income_other", is.na(check_clean)) %>%
select(uuid, `Old Value`) %>%
left_join(select(cleann, starts_with("hh_main_source_income"), uuid))
## Joining, by = "uuid"
Do “Clarification from choices” and “Clarification from translation” mean that the answer was reclassified?
Action: check whether the clarified answers were correctly recoded/reclassified.
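The Reason column mixes spellings of the same idea ("recoded in the choices", "recoded inthe choices", "reclasified"). Grouping them before tabulating makes the counts easier to compare. The mapping below is a hypothetical sketch; the categories are not defined anywhere in your log.

```r
library(dplyr)

reasons <- c("Clarification from choices", "other options recoded in the choices",
             "other options recoded inthe choices", "reclasified",
             "irrelevent entries")

# Hypothetical mapping from typed reasons to a few categories
reason_group <- case_when(
  grepl("recoded|reclas", reasons) ~ "recoded",
  grepl("^Clarification", reasons) ~ "clarified",
  grepl("irrelevent|irrelevant", reasons) ~ "irrelevant",
  TRUE ~ "other"
)

table(reason_group)
```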
common_type_ids_other
dd %>%
filter(Question == "common_type_ids_other", is.na(check_clean)) %>%
select(uuid, `Old Value`, `New Value`) %>%
left_join(select(cleann, starts_with("common_type_ids"), uuid))
## Joining, by = "uuid"
Same as for the other “other” questions.
Action: check the recoding and the action taken.
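Since the same check repeats for every `_other` question, it could be wrapped in one helper. `check_other_recoding()` is a hypothetical function, not part of any package; it assumes the parent question is the `_other` name without the suffix and that "other" appears as a space-separated choice when still selected.

```r
library(dplyr)

# Hypothetical helper: for each "_other" log entry, look up the parent question
# in the clean data and flag whether "other" is still selected there
check_other_recoding <- function(log, clean) {
  log <- filter(log, grepl("_other$", Question))
  parent <- sub("_other$", "", log$Question)
  parent_value <- vapply(seq_len(nrow(log)), function(i) {
    if (!parent[i] %in% names(clean)) return(NA_character_)
    as.character(clean[[parent[i]]][match(log$uuid[i], clean$uuid)])
  }, character(1))
  log %>%
    mutate(parent = parent,
           parent_value = parent_value,
           still_other = grepl("\\bother\\b", parent_value, perl = TRUE))
}

# Toy data (hypothetical choice names)
log_demo <- tibble(
  uuid = c("u1", "u2", "u3"),
  Question = c("common_type_ids_other", "common_type_ids_other", "shelter_issues_other")
)
clean_demo <- tibble(
  uuid = c("u1", "u2", "u3"),
  common_type_ids = c("national_id", "other", "x"),
  shelter_issues = c("x", "x", "lack_privacy other")
)

result <- check_other_recoding(log_demo, clean_demo)
```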
Wrong cleaning values
dd %>% filter(check_clean == FALSE)
dd %>% filter(check_clean == FALSE) %>% nrow()
## [1] 4376
Recoding FALSE and TRUE in New Value to 0/1, to match the coding of the clean dataset
logg2 <- logg %>%
mutate(`New Value` = ifelse(`New Value` == "FALSE", 0, `New Value`),
`New Value`= ifelse(`New Value` == "TRUE", 1, `New Value`))
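The two nested ifelse() calls can also be written as a single dplyr::recode() call, which replaces only the listed values and leaves everything else, including NA, unchanged. Note that recode() is marked as superseded in recent dplyr versions but is still available. A sketch on a toy vector:

```r
library(dplyr)

new_value <- c("FALSE", "TRUE", "other", NA)

# Replace the listed values; other values and NA stay as they are
recoded <- recode(new_value, `FALSE` = "0", `TRUE` = "1")

recoded
```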
old_new_values2 <- mapply(old_new,
cleaning_log = split(logg2, row.names(logg2)),
variable = "Question",
MoreArgs = list(
data_raw = raww,
data_clean = cleann,
uuid_raw = "uuid",
uuid_clean = "uuid",
uuid_cleaning_log = "uuid"
),
SIMPLIFY = F) %>% do.call(rbind, .)
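The old_new() function is not shown here, so its internals are unknown. As an alternative to looking values up row by row with mapply(), the raw and clean datasets could be reshaped to long format and joined to the log in one step. This is a sketch under the assumption that both datasets are wide (one column per question) with a `uuid` column; all toy data is hypothetical.

```r
library(dplyr)
library(tidyr)

raw_demo   <- tibble(uuid = c("u1", "u2"), q1 = c("a", "b"), q2 = c(1, 2))
clean_demo <- tibble(uuid = c("u1", "u2"), q1 = c("a", "c"), q2 = c(1, 3))
log_demo   <- tibble(uuid = c("u2", "u2"), Question = c("q1", "q2"),
                     `New Value` = c("c", "3"))

# One row per (uuid, Question); answers converted to character so that
# columns of different types can be stacked
to_long <- function(data, value_name) {
  data %>%
    mutate(across(-uuid, as.character)) %>%
    pivot_longer(-uuid, names_to = "Question", values_to = value_name)
}

checked <- log_demo %>%
  left_join(to_long(raw_demo, "value_raw"), by = c("uuid", "Question")) %>%
  left_join(to_long(clean_demo, "value_clean"), by = c("uuid", "Question")) %>%
  mutate(check_clean = `New Value` == value_clean)

checked
```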
dd2 <- logg2 %>%
mutate(binding = paste0(uuid, Question)) %>%
left_join(old_new_values2) %>%
mutate(check_raw = `Old Value` == value_raw,
check_clean = `New Value` == value_clean) %>%
select(-c(binding, ID, `Follow-up`, Enumerator, Community, `Modified by?`, Notes, check_raw),
uuid, Question,`Old Value`, `New Value`, value_raw, value_clean, check_clean, Reason)
## Joining, by = "binding"
dd2 %>% filter(check_clean == F) %>% nrow()
## [1] 705
After recoding TRUE/FALSE, the number of wrong values drops from 4376 to 705.
dd2 %>% filter(check_clean == F)
It seems the value "casuel _labour _Wages _construction _etc" in New Value has a typo.
dd2 %>% filter(check_clean == F,
`New Value` != "casuel _labour _Wages _construction _etc") %>% nrow()
## [1] 535
Excluding this value, 535 wrong values remain.
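Typos like this one can be found systematically by comparing New Value against the list of valid choices and suggesting the closest one by edit distance with base R's adist(). In the sketch below, the `choices` vector, including the spelling "casual_labour_wages_construction_etc", is hypothetical; the real list should come from the choices sheet of the questionnaire.

```r
# Hypothetical list of valid choice names
choices <- c("casual_labour_wages_construction_etc", "trade", "remittances")
new_values <- c("casuel _labour _Wages _construction _etc", "trade")

# Ignore case and spaces before measuring edit distance
normalise <- function(x) gsub("\\s+", "", tolower(x))

# Rows: new values, columns: choices
distances <- adist(normalise(new_values), normalise(choices))
closest <- choices[apply(distances, 1, which.min)]
not_in_list <- !new_values %in% choices

data.frame(new_values, closest, not_in_list)
```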
dd2 %>% filter(check_clean == F,
`New Value` != "casuel _labour _Wages _construction _etc")
Some of the remaining values still look wrong.
Action: please check them.
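Some of the remaining mismatches may only be formatting differences (case, extra spaces, "3" vs "3.0"). A sketch that separates these from real disagreements, on hypothetical toy rows:

```r
library(dplyr)

mismatches <- tibble(
  Question = c("q1", "q1", "q2"),
  `New Value` = c("3", " yes", "a"),
  value_clean = c("3.0", "yes", "b")
)

# Convert to number, returning NA (without a warning) for non-numeric text
as_num <- function(x) suppressWarnings(as.numeric(x))

mismatches <- mismatches %>%
  mutate(
    # Same text once case and surrounding spaces are ignored
    format_only = tolower(trimws(`New Value`)) == tolower(trimws(value_clean)),
    # Same number written differently
    same_number = !is.na(as_num(`New Value`)) & !is.na(as_num(value_clean)) &
      as_num(`New Value`) == as_num(value_clean),
    real_mismatch = !(format_only | same_number)
  )

count(mismatches, Question, real_mismatch)
```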